Skip to content

Conversation

@tru
Copy link
Collaborator

@tru tru commented May 12, 2025

The current parser just checks one step back for a R before each string to know that it's a rawstring. But the preprocessor is much more advanced here and can have constructs like:

R
"str"

And much more. This patch also adds more test coverage for Rawstrings in the dependencydirectivesscanner.

This was co-authored by Sylvain Audi [email protected] (@sylvain-audi)

Fixes #137648

@tru tru requested a review from jansvoboda11 May 12, 2025 06:41
@tru tru added the clang:tooling LibTooling label May 12, 2025
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 12, 2025
@llvmbot
Copy link
Member

llvmbot commented May 12, 2025

@llvm/pr-subscribers-clang

Author: Tobias Hieta (tru)

Changes

The current parser just checks one step back for a R before each string to know that it's a rawstring. But the preprocessor is much more advanced here and can have constructs like:

R
"str"

And much more. This patch also adds more test coverage for Rawstrings in the dependencydirectivesscanner.

This was co-authored by Sylvain Audi <[email protected]> (@sylvain-audi)

Fixes #137648


Full diff: https://github.com/llvm/llvm-project/pull/139504.diff

2 Files Affected:

  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+36-7)
  • (added) clang/test/ClangScanDeps/raw-strings.cpp (+55)
diff --git a/clang/lib/Lex/DependencyDirectivesScanner.cpp b/clang/lib/Lex/DependencyDirectivesScanner.cpp
index 088d1cc96e3a2..86e860abdbbdc 100644
--- a/clang/lib/Lex/DependencyDirectivesScanner.cpp
+++ b/clang/lib/Lex/DependencyDirectivesScanner.cpp
@@ -206,6 +206,24 @@ static void skipOverSpaces(const char *&First, const char *const End) {
     ++First;
 }
 
+// Move back by one character, skipping escaped newlines (backslash + \n)
+static char previousChar(const char *First, const char *&Current) {
+  assert(Current > First);
+  --Current;
+  while (Current > First + 1 && isVerticalWhitespace(*Current)) {
+    const char PrevChar = *(Current - 1);
+    if (PrevChar == '\\') {
+      Current -= 2; // backslash + (\n or \r)
+    } else if (Current > First + 2 && isVerticalWhitespace(PrevChar) &&
+               PrevChar != *Current && *(Current - 2) == '\\') {
+      Current -= 3; // backslash + (\n\r or \r\n)
+    } else {
+      break;
+    }
+  }
+  return *Current;
+}
+
 [[nodiscard]] static bool isRawStringLiteral(const char *First,
                                              const char *Current) {
   assert(First <= Current);
@@ -215,25 +233,28 @@ static void skipOverSpaces(const char *&First, const char *const End) {
     return false;
 
   // Check for an "R".
-  --Current;
-  if (*Current != 'R')
+  if (previousChar(First, Current) != 'R')
     return false;
-  if (First == Current || !isAsciiIdentifierContinue(*--Current))
+  if (First == Current ||
+      !isAsciiIdentifierContinue(previousChar(First, Current)))
     return true;
 
   // Check for a prefix of "u", "U", or "L".
   if (*Current == 'u' || *Current == 'U' || *Current == 'L')
-    return First == Current || !isAsciiIdentifierContinue(*--Current);
+    return First == Current ||
+           !isAsciiIdentifierContinue(previousChar(First, Current));
 
   // Check for a prefix of "u8".
-  if (*Current != '8' || First == Current || *Current-- != 'u')
+  if (*Current != '8' || First == Current ||
+      previousChar(First, Current) != 'u')
     return false;
-  return First == Current || !isAsciiIdentifierContinue(*--Current);
+  return First == Current ||
+         !isAsciiIdentifierContinue(previousChar(First, Current));
 }
 
 static void skipRawString(const char *&First, const char *const End) {
   assert(First[0] == '"');
-  assert(First[-1] == 'R');
+  //assert(First[-1] == 'R');
 
   const char *Last = ++First;
   while (Last != End && *Last != '(')
@@ -416,6 +437,14 @@ void Scanner::skipLine(const char *&First, const char *const End) {
         continue;
       }
 
+      // Continue on the same line if an EOL is preceded with backslash
+      if (First + 1 < End && *First == '\\') {
+        if (unsigned Len = isEOL(First + 1, End)) {
+          First += 1 + Len;
+          continue;
+        }
+      }
+
       // Iterate over comments correctly.
       if (*First != '/' || End - First < 2) {
         LastTokenPtr = First;
diff --git a/clang/test/ClangScanDeps/raw-strings.cpp b/clang/test/ClangScanDeps/raw-strings.cpp
new file mode 100644
index 0000000000000..5fda4a559c9e3
--- /dev/null
+++ b/clang/test/ClangScanDeps/raw-strings.cpp
@@ -0,0 +1,55 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb.json.in > %t/cdb.json
+
+//--- cdb.json.in
+[{
+    "directory": "DIR",
+    "command": "clang -c DIR/tu.c -o DIR/tu.o -IDIR/include",
+    "file": "DIR/tu.c"
+}]
+//--- include/header.h
+//--- include/header2.h
+//--- include/header3.h
+//--- include/header4.h
+//--- tu.c
+#if 0
+R"x()x"
+#endif
+
+#include "header.h"
+
+#if 0
+R"y(";
+#endif
+#include "header2.h"
+
+#if 0
+//")y"
+#endif
+
+#if 0
+R"y(";
+R"z()y";
+#endif
+#include "header3.h"
+#if 0
+//")z"
+#endif
+
+#if 0
+R\
+"y(";
+R"z()y";
+#endif
+#include "header4.h"
+#if 0
+//")z"
+#endif
+
+// RUN: clang-scan-deps -compilation-database %t/cdb.json -mode preprocess | FileCheck %s
+// RUN: clang-scan-deps -compilation-database %t/cdb.json -mode preprocess-dependency-directives | FileCheck %s
+// CHECK: tu.c
+// CHECK-NEXT: header.h
+// CHECK-NEXT: header3.h
+// CHECK-NEXT: header4.h

@github-actions
Copy link

github-actions bot commented May 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

tru added 3 commits June 23, 2025 10:04
The current parser just checks one step back for a R before each string
to know that it's a rawstring. But the preprocessor is much more
advanced here and can have constructs like:

R\
"str"

And much more. This patch also adds more test coverage for Rawstrings in
the dependencydirectivesscanner.

This was co-authored by Sylvain Audi <[email protected]>

Fixes llvm#137648
This is so that we can use it in the dependency scanner without duplicating the logic there.
As suggested in the review, use Lexer::getEscapedNewLineSize() instead
of implementing our own logic for it.
@tru tru force-pushed the scan_deps_rawstrings branch from b722c2e to 63ca116 Compare June 23, 2025 09:11
@tru
Copy link
Collaborator Author

tru commented Jun 23, 2025

Sorry for the delay. I have addressed both your comments now @benlangmuir - reworked it to use Lexer::getEscapedNewLineSize() and removed the assertline. I also expanded the test coverage.

@tru tru merged commit 1fb786e into llvm:main Jun 27, 2025
7 checks passed
@tru tru deleted the scan_deps_rawstrings branch June 27, 2025 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:tooling LibTooling clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing dependency from clang-scan-deps after b768a8c1db85b9e84fd8b356570a3a8fbe37acf6

3 participants